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Abstract 

We quantify the influence of the topology of a transcriptional regulatory network 
on its ability to process environmental signals. By posing the problem in terms of 
information theory, we may do this without specifying the function performed by 
the network. Specifically, we study the maximum mutual information between the 
input (chemical) signal and the output (genetic) response attainable by the network 
in the context of an analytic model of particle number fluctuations. We perform this 
analysis for all biochemical circuits, including various feedback loops, that can be 
built out of 3 chemical species, each under the control of one regulator. We find that 
a generic network, constrained to low molecule numbers and reasonable response 
times, can transduce more information than a simple binary switch and, in fact, 
manages to achieve close to the optimal information transmission fidelity. These 
high-information solutions are robust to tenfold changes in most of the networks' 
biochemical parameters; moreover they are easier to achieve in networks containing 
cycles with an odd number of negative regulators (overall negative feedback) due to 
their decreased molecular noise (a result which we derive analytically). Finally, we 
demonstrate that a single circuit can support multiple high-information solutions. 
These findings suggest a potential resolution of the "cross-talk" dilemma as well 
as the previously unexplained observation that transcription factors which undergo 
proteolysis are more likely to be auto-repressive. 
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1 Introduction 



Genetic regulatory networks act as biochemical computing machines in cells, measuring, 
processing, and integrating inputs from the cellular and extracellular environment and 
producing appropriate outputs in the form of gene expression. The behavior of these 
networks is not deterministic; many of the molecules involved in genetic regulation (e.g., 
DNA, mRNA, transcription factors) are found in low copy numbers, and are thus sub- 
ject to severe copy number fluctuations. In living cells, the origins and consequences of 
stochasticity are well-studied 



30; 36; 43; 52; ft 



one can analyze propagation of 
noise through cellular network s (|46f) and disambiguate noise from different sources (e.g., 
intrinsic vs. extrinsic (ly; |49j; l6ll)). Surprisingly, cells function in the presence of noise 
remarkably well, often performing close to the physical limits imposed by the discreteness 
of the signals and the signal processing machinery fl 0). 

From an information-theoretic perspective (1581 ). noise intrinsic to the gene network 
presents an obstacle for signal transduction and biochemical computation: with too much 
noise, the information about the state of the environment (the signal) may be lost. While 
it may be possible to build stable biochemical switches even in the presence of just tens 
of copies of a transcription factor (jij), networks often need to exhibit a more detailed 
response. As an example, the well-studied p53 module responds to ionizing radiation in 



a "digital" manner (1351 ; 1371 ). initiating a number of disparate cellular responses, including 



cell cycle arrest, apoptosis, and induction of cellular differentiation, among others (1661 ) . 
The p53 module must not only transduce a simple binary answer (was there DNA damage 
or not?), but also more specific information (What was the damage? How severe? What 
should be done about it?) It is not evident that a few tens of molecules, whose abundance 
is subject to intrinsic copy number fluctuations, can successfully perform this task. Of 
note, a series of recent papers studying the effect of single allele loss in various tumor 
supressor genes, including p53, challenge the classic two-hit model of tumorigenesis (|32j) by 
demonstrating dosage-dependent modulation of phenotype (see (1171 ; l2ll ; 1281 ) and references 
therein) . 

The above example is just one of many instances of "cross-talk" -a perplexing dilemma 
observed across many cellular signaling systems in which a single noisy biomolecular 
species, presumably existing in just two states (active/inactive), is able to transmit com- 
plicated information. Perhaps the most well-studied example of cross-talk occurs in the 
protein signaling mitogen-activated protein kinase (MAPK) pathways. MAPK cascades 
transduce multiple stimuli from the environment into distinct genetic programs. Many of 
these signals are transmitted by common components (|39l ) . Specificity can be established 
by sequestration including cell type, subcellular localization, temporal, or with scaffold 
proteins 



39l ; l53l ; 1571 ). In some cases sequestration mechanisms are not available 



and specificity is achieved via signaling kinetics. For example, in mammal pheochromoc- 
tyoma cells, ligands triggering distinct programs (proliferate or differentiate) activate the 
same receptor tyrosine kinase pathway but with different amplitudes (1421) . In fact, by 
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increasing or decreasing receptor expression, the wrong program may be also be initiated 
(1551 ) implying that tight control of kinetics may be required and that uncontrolled noise 
may be disastrous. Cross-talk can be exploited by cancer cells to initiate uncontrolled 
cell growth (19) even in the presence of chemotherapeutic agents targeting individual 
signaling pathways. 

In this Chapter we present an information-theoretic measure of circuit quality which 
is independent of an assumed network's function and demonstrate that generic small net- 
works under biological constraints can transduce more information than a simple binary 
switch, often coming close to the optimal transmission fidelity, which we calculate nu- 
merically and analytically from physical constraints. In choosing a general information- 
theoretic quality measure, we obviate the problem of requiring prior knowledge of the 
function of the network (nj; 0; M, S H m H M), which is obviously network-specific 
and often unknown, and the related problem that a given network may perform multiple 
functions (13 It 1671 ; l69l ). We also demonstrate that the presence of an odd number of neg- 
ative regulators in a feedback loop confers an advantage to the circuit in terms of noise 
regulation and thus information transmission. Finally, we show that the ability to trans- 
duce information reliably is insensitive to most large (tenfold) deviations of a network's 
kinetic parameters. 



1.1 Measure of quality of a biochemical computation 

To motivate our approach, consider the experimental set-up of Guet et al. (jiil ). Probing 
experimentally the relationship between structure and function in transcriptional net- 
works, Guet and coworkers built a combinatorial library of 3-gene circuits and looked at 
the steady-state expression G of a reporter gene (GFP), coupled to one of the genes in 
the circuit, in response to four different chemical inputs C, namely two binary states of 
two different chemicals. The circuits acted as transducers, converting chemical signals 
into genetic response. They found that some networks could perform different behaviors 
(that is, behave as different logic gates), while others could achieve only one particular 
function. Of note, while some circuits responded differently to different inputs, for other 
circuits, the reporter expression did not depend on the chemical input state. The latter 
are clearly "broken circuits," transducing no information about the inputs. 

The actual number of GFP reporters in each cell clearly is not repeatable due to 
the stochastic nature of the involved cellular machinery. For this reason, the input- 
output relation for a circuit should be described in terms of the probability distribution 
P(c, g) = P(C = c,G = g), where c stands for particular chemical states, and g measures 
the number of reporter molecules. Then a natural measure of a circuit's quality is the 
mutual information between its inputs and outputs (1581 ) 

I(C,G) = J dcdgP(c,g)log p^ (1) 
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where log is taken with base 2. This dimensionless, nonmetric quantity measures in bits 
the extent to which C and G are dependent (complete independence implies p(C, G) = 
p(C)p(G), and thus I(C,G) = 0). The mutual information is bounded, < I(C, G) < 
min[ii"(C), H(G)], where H(X) is the entropy, H(X) = — Yj X p(. x ) \ogp(x). In (1251 ) . there 
were ||C|| =4 possible input states c G {1,2,3,4} = {q} and two possible output states, 
GFP on or off. For a circuit with a constant g, H(G) = 0, and then I(G,C) = 0. At 
the other extreme, if the reporter gene is on for exactly two of the four equiprobable 
chemical inputs, then each reporter state has p = 1/2, and I(C, G) = 1 bit. Similarly, for 
multinomial distributions of g, the mutual information seamlessly takes into account all 
possible relations between g and c. 

A crucial advantage in adopting mutual information as a quality measure is that it 
can be evaluated independently of the function of the circuit. For steady state responses 
considered here, the only reasonable way to define a qualitative function of the circuit, or 
to characterize the computation performed by it, is to consider how (g{ci)) are ordered. 
As long as all ||C|| responses are sufficiently resolved, the mutual information will be 
~ log ||C||, irrespective of the ordering. Thus the mutual information-based circuit qual- 
ity measure is insensitive to the type of computation performed by the circuit, but only 
to whether the computation assigns a different output to each input. Furthermore, due to 
the data processing inequality (1581 ) . high I(C, G) is a sufficient condition for a high-quality 
realization of any computational function that depends (stochastically or deterministi- 
cally) on P(c,g). High I(C,G) is especially important for sensory stages in biochemical 
signaling, where the same biomolecular species may control responses of many different 
biochemical modules, requiring high quality information about many different properties 
of the signal at the same time. 



1.2 Proposal 

We propose to investigate how the topology of a regulatory circuit affects its computational 
and information transmission properties, as measured by the steady state signal-response 



mutual information, Eq. (CD). While the results of (1251 ) may be interpreted as revealing 
that some circuits may perform better than others, this effect can be caused in part 
by operating at suboptimal kinetic parameters, some of which are biologically easy to 
adjust to improve the information transmission fidelity. In fact in (jiil ). several networks 
identical in topology but different in their kinetic properties, performed markedly different 
functions. To avoid the problem, we study instead the maximum mutual information 
attainable by the circuit under realistic conditions. Specifically, for a regulatory circuit 
t, with a set of kinetic parameters = {$i, $2> • • • }, which responds to inducer (input) 
concentrations C = {q} = {c±,C2, ■ ■ ■} with different genetic (output) expression levels 
P(g\c, t9), we propose to investigate numerically tfj = argmax^ I*(C, G). 



As in (I25l ) . we limit ourselves to 3-gene topologies where each gene is regulated by 



exactly one transcription factor (see Tables H] and [5] for the list of these topologies, and 
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Sec. 12.11 for their more detailed description). We measure the output of the circuit in 
terms of the steady-state expression of the reporter gene, which is always downregulated 
by another gene denoted Z (see Tables H] and [5]) . This limits us to 24 possible circuits, 
cf. Sec. 12.11 The kinetics associated with these topologies are described in Sec. 12.21 Note 
in particular that even though we use the genetic regulation terminology throughout the 
paper, the kinetic model is general enough to account for protein signaling and other 
regulatory mechanisms as well. 

For each of the chosen topologies, we need to find stable fixed points of the dynamical 
systems that describe the circuit, cf. Sec. 12.31 evaluate the fluctuations around them, 
P(g\c), estimate the corresponding mutual information, and then optimize it with respect 
to the kinetic parameters. Note that all of the parameters of the system that we treat 
as variable can, in fact, be easily adjusted by the cell over its lifetime by means of many 
biological mechanisms, cf. Sec. 12.21 

Rather than discretizing the reporter output, as in (pa l, we take into account the 
actual numbers of the reporter molecules. Assuming mesoscopic copy numbers, we use 
the linear noise approximation, cf. Sec. 12.41 to derive the reporter gene distribution as a 
sum of Gaussians with means at the stable fixed points. Under this assumption, the mu- 
tual information between the two random variables, C representing the discrete chemical 
(input) states, and G measuring the continuous reporter expression (output), is 

HC, G) = ±-YY dgU{gt of) log - M ■ ( 2 ) 

c=l i=l J MM l^d=l Z^k=l ■ /V U/fc; °k) 

Here M is the total number of fixed point calculations performed for the circuit, and M c 
is the number of those done with C = c; Af(gf, of) denotes the output response for the 
i'th calculation with C = c, which is a Gaussian distribution with mean g\ and variance 
(of) 2 . Many calculations at each point in the parameter space, are needed to explore 
multiple stable fixed points of the dynamical system (see Sec. 12.31) . Finally, we choose 
each chemical state with equal probability, P(c) = const. 

When optimizing Eq. ([2]) with respect to # (see Sec. 12.61) . we need to consider two 
computationally trivial (and biologically unrealistic) ways of achieving high J(C, G). First, 
given discrete c and an infinite range of g, achieving the upper bound I(C, G) = H(C) 
is easy: as the number of molecules of the reporter g c increases, the magnitude of its 
fluctuations, as measured by its standard deviation a c , grows slower as o c ~ \fg E -, so the 
responses to all c's can be well-separated if we allow for an infinite number of molecules. 
This, however, is not biologically relevant, since producing many copies of a molecule 
takes time and energy, both of which are limited. In fact, here we are interested only 
in solutions that involve low copy numbers as this is precisely the regime in which gene 
regulatory networks function. We note also that many apparently deterministic, high 
copy number systems may actually fall into this regime if the threshold of the system can 
be overcome with only a few molecules (JqI; 29; 41; 56). 
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Second, and perhaps less obvious, a trivial solution can also be obtained if we allow 
for multi-scale {stiff) systems. For example, if the response time of the reporter tq is very 
large relative to the upstream regulators tz, then all of the noisy upstream fluctuations 
will be filtered (0; @): effectively, the reporter takes NzTq/tz ^> 1 samples of Z per its 
response time (here Nz is the mean number of Z molecules), and fluctuations are small. 
However, living cells must respond in a timely manner to changes in the environment, so 
infinite response times are also not biologically relevant. 

These observations suggest that our objective function to be maximized requires some 
biologically reasonable constraints. For this reason, we have investigated many different 
realizations of the constraints, and, instead of maximizing the mutual information, we 
chose to maximize the following constrained mutual information 

L = I{G,C)-\{N)-<y{q), (3) 

where A and 7 are chosen such that the average number of molecules of all of the com- 
ponents in the system (N) = ngpf N Sc=i Y^i^i Sj=i ^ij (where ]V?. is the average 
number of molecules of species i for fixed point j given C = c and N s = 4 is the total 
number of species in the system) is on the order of 100, and the average stiffness of the 
system (q) = Ylii r il r G (where are the decay rates of the transcription factors, and r G 
is the decay rate of the reporter) is approximately 3 orders of magnitude. 



2 Methods 
2.1 Topologies 

As in the experimental set-up in (liil ). we consider 3- node circuits, in which genes are regu- 
lated by exactly one gene (including the possibility of auto-regulation). This also reduces 
the assumptions we would otherwise need to make about the dynamics associated with 
combinatorial regulation. The 3 genes (X, Y, and Z) in each circuit are interconnected 
by exactly 3 edges. There are only 3 such non-redundant topological structures, which, 
when we include the possibility of either excitatory or inhibitory interactions, results in 
2 3 = 8 possible configurations per structure, for a total of 24 topologies (see Tables @] and 



[5]). The fourth (reporter) gene G is always down-regulated by Z, as in (1251) . Extensions 
to other topology classes are easily implemented. 



2.2 Model and parameters 

The dynamics of transcription and translation have been modeled with remarkable success 
for small circuits by avoiding the translation step completely and coup ling the genes to 
each other directly by means of simple rational functions otj (15; 2(J 26). In general, each 
of the species X = {X, Y, Z, G} in the circuit is subject to a degradation and a creation 
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process 



X ^ 0, (4) 
^ X. (5) 

The macroscopic, deterministic description of the system is governed by the system of 
ordinary differential equations 

d<j)j/dt = -rj4>j + aj(4> nj ), (6) 

where {0i, . . . , 4>n} is the concentration vector of the N chemical components, rj is the 
degradation rate of <pj, and aj is a production term that depends on the concentration of a 
regulator (parent) molecule of j, namely iTj. The production is modeled as a constitutive 
expression (the leak) plus a Hill activation or inhibition, 

a(4>) = a + a Kn + {(f)/Sir (7) 

or 

K n 

a((f>) = do + a 



K n + (<p/ Si ) nl 

where ao describes the leakiness of the promoter, a specifies its dynamic range, K is the 
concentration of the regulator at half-saturation (the Michaelis constant), n is the Hill 
coefficient, and Sj is the modulating effect of the z'th input molecule on the regulator 
protein. Sj can be modeled equivalently by rescaling K. One can think of this as the 
chemical signal binding to the protein, changing its conformation, and influencing various 
affinities. The two equations correspond to an excitatory and inhibitory regulation, re- 
spectively. For this dynamics, there is no distinction between the protein and the mRNA 
of a gene species, and we use the terms interchangeably. As in (1251 ). we allow each input 
to take two binary states (either the input molecule is present or not). We have a total of 
3 inputs and 2 3 input states, and each input modulates the expression of one of the three 
transcription factors. For a chemical state c where an input molecule i is not present, we 
set Si — 1. We set the units of measurements such that volume Q of a cell is 1, so that 
concentration of 1 is equivalent to 1 molecule per cell. 
In all we have 

1. 3 decay rates rx, ry, fz corresponding to decay rates for the 3 transcription factors. 
We set tq corresponding to a response time of approximately a half hour. 

2. 4 Michaelis constants K x , Ky, K z , K G and 4 range parameters ax,a Y ,a z ,a G de- 
scribing the regulation function for each component of the circuit. 

3. 3 input parameters sx,sy,sz modulating the effect of each input on the 3 tran- 
scription factors 
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4. 1 leak parameter qq. 



For simplicity we assume n = 2. In total we have 15 parameters. 

Notice that all of these parameters can be easily adjusted by the cell by means of a 
variety of biological mechanisms, thus validating our proposal to study the dependence 
of the signal transduction, optimized with respect to the parameter values. Below is a 
non-exhaustive list of such regulatory mechanisms. 

1. All protein/mRNA decay rates can be adjusted independently of each other by 
microRNA expressions or by regulated proteolysis, such as using ubiquitin tagging. 

2. Michaelis constants depend on structural properties of proteins and the DNA, as well 
as on the abundance of the proteins near a DNA binding site compared to the overall 
protein concentration. Thus they can be adjusted by chromatin rearrangement, or 
by controlling the nuclear pore transport. 

3. Effects of chemical inputs on transcription factors depend on the chemical-protein 
affinity and on the abundance of the chemicals near the relevant proteins. The 
former can be changed by modulating chemical-protein binding reaction by means 
of expression of various enzymes, while the latter can be achieved by controlling 
transport processes. 

4. The leak depends on the concentration of the RNA polymerase, ribosome, as well 
as the DNA accessibility. All are easy to adjust in a living cell. 

2.3 Determining Stable Fixed Points 



All of our circuits incorporate some feedback mechanism (e.g., the "feedback dyad" (17ll )) 
and, therefore, may have multiple stable steady state solutions. We find these by numer- 
ically solving the macroscopic chemical kinetics system ([6]) describing the circuit using 
MATLAB's odel5s with the parameters as described in section l2~2l We randomly sample 
different initial conditions for the time-evolution to obtain a set of (almost all) fixed points 
for each chemical state and each topology. Additionally, since in vivo the system will be 
flipping between different input states, the steady-state solution of one input state is the 
potential initial condition for the time-evolution of the other inputs. To include these 
potential initial conditions, we first randomly choose 10 initial conditions for each and 
then we take the resulting stable solutions and use them as the initial conditions for each 

When a time-evolution of the system results in oscillations or chaotic behaviors, we 
neglect these solutions since, under our assumptions, they will result in multiple genetic 
outputs corresponding to the same chemical input and hence in a low mutual information. 
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2.4 Linear Noise Approximation (LNA) 

For excellent reviews and discussions of the linear noise technique, we refer the reader 



to (]13l : Il4r l45l : I64J ). Here we briefly review one particular formulation that simplifies the 
analysis. 

Given a system with volume Q and N different particles, we denote the particle con- 
centrations as (f) = {(pi, . . . ,4>n}, and the copy numbers as n = Q<fi. The state of the 
system is defined by n, and it changes when an elementary reaction j, j = 1, . . . , R takes 
place. When reaction j occurs, the copy number rii changes by Sy, which is the N x R 
stochiometric matrix. Then the evolution of the joint probability distribution P(n, t) is 
given by the following master equation 

d -^> = n^n^E-* - l)/,(0,fi)P(n,t), (9) 
J'=l 

where E~ Si: > is the step operator, which acts by removing SV,- molecules from rij, and fj is 
a rate for j. 

While this equation is usually mathematically intractable, a Monte Carlo algorithm 



exists to solve it numerically (the Gillespie algorithm) (1221; 1231). To generate a particular 
stochastic trajectory, this method draws random pairs (r, e) from the joint probability 
density function P(r,e\n), where r is the time to the next elementary reaction, and e 
is its index. Multiple trajectories allow to estimate the necessary moments of P(n,t). 
However, this approach is computationally intensive, and quickly becomes infeasible if 
one wants to explore multiple system parameterizations, or if fj span multiple scales. 

Alternatively, one can expand the master equation in orders of f2~ 1//2 . Introducing 
£, such that rij = VL<pi + fi 1 / 2 ^ and treating £ as continuous, the first two terms in the 
expansion yield the macroscopic rate and the linear Fokker-Plank equations, respectively: 

n° = i^-E^^ + ^EPU^. (ID 
at /-< d (l " k dtMt y ' 

i,k i,k 

where Aj k = Ylf=i an< i [B]ik = Ylf=i SijSkjfj(4>)- The steady-state solution of 

Eq. pip is a multivariate Gaussian 

P(0 = [(2tt) n det 5] ~ 1/2 exp ("^) , (12) 

where the covariance matrix 5 is given by the matrix Lyapunov equation 

AS + EA T + B = 0. (13) 
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This system is solved using the standard matrix Lyapunov equation solvers (MATLAB's 
lyap). In order to assess the validity of LNA for our system we compared the steady 
state solutions to multiple Gillespie runs. We found that, even at very low copy numbers 
(~ 10), LNA performed well as measured by the Jensen-Shannon divergence (see Sec. 12.51 
for details). Based on these results, we approximate the steady-state distribution as a 
sum of multivariate Gaussians with means at the stable fixed points of Eq. (fTOl) . and with 
covariances as in Eq. ffl3l) . 

We note that both the LNA and the Gillespie algorithm are derived assuming that the 
reactions j are truly elementary. In our system, a single particle creation, a, encapsulates 
all processes, starting from the protein-DNA binding and ending with the translation. 
Justification for using "elementary complex" reactions is provided in 
However, the complex nature of the reactions has acomparatively small influence on the 
low frequency components of thestochastic response (? ), which is our focus here. For 
this reason we believe that approximating terms in Eq. as elementary and using LNA 
is a less important approximation than merging transcription and translation into a single 
step. Generalizations to LNA with elementary reactions is straight-forward, provided the 
reaction system is known (which is more complicated). 



2.5 LNA Performance 

Here we quantify LNA accuracy as a function of the mean molecule numbers in the system. 
We approximate the steady-state probability distribution p\ using LNA and compare this 
to the steady-state distribution p g obtained with multiple trajectories of the Gillespie 
algorithm. As a measure of the similarity between the two probability distributions, we 
use the symmetric Jensen- Shannon divergence JSfi 

JSn\pi,Pg] = ^iDklIpiW^pi + 7r 2 p g )} + ^Dkl^W^iPi + 7r 2 p g )] (14) 

where D KL [pl\q] = J2 i Pi\og 2 (pi/qi) is the Kullback-Leibler divergence and we set tti = 
7r 2 = 1/2. We note that D^l > and in the case that the two distributions are equal 
pi = p g , then D KL = JSfi = 0. 

We calculated JSfi for multiple circuits and multiple parameterizations and found 
excellent consistency between LNA and Gillespie (see Fig. [1]). Below 10 molecules, the 
two distributions were easily distinguishable, but above 10 molecules we consistently found 
large overlap. 



2.6 Optimization 

We employ a simplex optimization (using MATLAB's fminsearch) to maximize L = 
I(G,C) — X(N) — j(q) over the log 10 of the 15 parameters where A and 7 are chosen 
to accommodate biologically relevant molecule numbers and stiffness. For example, for 
an average of approximately 100 molecules for each transcription factor and a stiffness 
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Figure 1: Jensen-Shannon divergence JSu between distributions obtained by LNA and 
the Gillespie algorithm for multiple circuits and multiple parameterizations plotted as 
a function of mean copy number. At JSu = 0, the distributions are identical. There 
appears to be a sharp threshold at 10 molecules, below which the LNA does poorly but 
above which, LNA does well. 
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of order 3 we choose A = 0.01 and 7 = 0.001. To explore the parameter space for 
each topology, we uniformly randomly select biologically relevant starting points (protein 
half-life near 10 minutes, promoter leakiness near 0.01 proteins/sec, promoter range near 
10 proteins/sec, regulator at half-saturation near 100 proteins/sec, and input molecule 
modulation of regulator near 2). To make the search for maxima more efficient we only 
maximize random points that start already above a certain threshold (L > 0). 

2.7 Maximum mutual information for a fixed copy number 

Suppose a molecular species G with concentration g, f dgP(g)g = No is used as a reporter 
species for a cascade of biochemical computations, so that the species is not allowed to 
participate in any feedback loops. Then its stochasticity is limited from below by a Poisson 
noise. That is, if g is the deterministic value of g produced by some biochemical reaction 
kinetics, and g ^> 1, then 

g = g + v, (15) 
(u) = 0, (uv) = g. (16) 

Furthermore, g itself is distributed probabilistically according to P(g), J dgP(g)g = Nq, 
due to stochasticity of inputs to and of the internal dynamics of the biochemical system. 
We are interested in the maximum number of bits that can be transmitted reliably by 
this reporter species (that is, is its channel capacity) at fixed N G . 

Intuitively, the noise in this system is ~ y/N G ~ y/Nc, so the number of distinguishable 
states of the reporter is also ~ y/Ng, and one should be able to transmit about 1/2 log 2 Nq 
bits reliably. This argument has been used extensively (e.g., (Il0l)). However, it fails (a) to 
establish the correct constant of proportionality in front of the number of distinguishable 
states and (b) to take into the account the g dependence of the noise variance (which 
leads to a higher resolution at smaller g). Both of these effects are likely to contribute 
only 0(1) bits to the channel capacity, but, for Nq < 100 considered in this work, this 
might be an important correction. We are unaware of a prior derivation of the channel 
capacity for this system up to o(l), and we present it here. 

We write: 

I{G,G) = H{G)-H{G\G) = H{G)-H{G\G)+o(^-j (17) 

= - J dg P(g) log 2 P(g) - ~ j dgP(g) log 2 2neg + O (J^ . (18) 

Eq. [T7] is valid if var (G\G) ~ Nq var (G) ~ N Q , and Eq. [TS] holds for a Poisson noise 
in the reporter. 

To find the channel capacity of the reporter species, we maximize I(G, G) with respect 
to P(g) subject to 

J dgP{g) = 1, J dgP(g)g = N G . (19) 
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This results in 
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-9/2^,5 



a/2 



(20) 



{2ngN G) 

where ~ is due to the approximation involved in replacing H(G) by H(G). Plugging 
P(G) into the equation for I, we get the channel capacity 



I (G,G) 



dgP(g) 



\ log 2 27cN g + log 2 
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+ 0(N^] 



(21) 



log 2 N G + 0(N 



G 



(22) 

Thus, for the optimal distribution of inputs, as in Eq. [201 the naive estimate of Iq = 
l/21og 2 Nq for a biochemical reporter is correct up to terms non- vanishing with Nq 1 . 
For the distribution of inputs analyzed in this work (up to 8 discrete input states), the 
maximum possible I(G, G) is clearly less than this channel capacity. One can obtain the 
maximum information for such input distributions by numerical optimization of I with 
respect to the values of the g input states, assuming a Poisson distribution of g around g. 



2.8 Network noise analysis using LNA 

Consider a regulatory network of N transcription factors indexed by i G {1,2, .. . , N}. 
The macroscopic behavior of the system is determined by 

01 = A(0i, • • • An) 

02 = / 2 (01, • • • ,07V) 



/tv(0i, ...,<Pn) 



where <f>{ is the concentration of the z'th transcription factor. Let n = Q<p be the vector of 
molecule copy numbers with volume Q. Using LNA (1131 ). we can calculate the covariance 
matrix C = ((n — (n))(n — (n)) T ) = EQ by solving Eq. [J3J 

Bu = (23) 




(24) 



This suggests that the topology or structure of the network can also play a role in con- 
trolling noise. Specifically, the variance of the i'th transcription factor Cu can be reduced 
by decreasing the product f^CV/ < 0, where j G 7Tj. 
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The covariance Cij is a more complicated function of the other covariances: 

^ = -fet c * + ^t c ") (26) 

(9fi &f' 9fi 9f 1 

"aT^v + 'aT^'ij + / . Tr^'fc + 7 , TrrCik , (27) 

^ = m s « + £ E H Qfc • (28) 

9</>i ^ 90j V k^i,k£^i k^j^e-Kj ^ K J 

If j G 7Tj, then from Eq. [28] we see that is a function of the covariances between % 
and the regulators of its regulators (Cik, where k E ttj, and j G 7Tj). We can write these 
covariances in turn as functions of covariances between i and the regulators of regulators 
of regulators of i, and so on. This implies a recursion which will end when we either reach 
a regulator which has no other regulators or, in the case of a cycle, we reach i again. 

In the latter case, the recursion will end back with Cu, and the last term in Eq. 
will have the form 

c - n <») 

j 6 cycle J 

Since Cu > 0, this implies that one way to reduce (and hence Cu itself) is to have 
the product in Eq. [2H] be negative. Crucially, the only way to achieve this is if the cycle 
contains an odd number of negative regulators. 



2.9 Some simple examples of sub-Poisson noise 

The transcription factors in the network may participate in various feedback loops. In 
some cases, this allows the usual Poisson noise lower bound to be overcome, resulting in a 
sub-Poisson noise (Cu < fa). Below we give some simple examples for l-,2-,and 3-cycles. 

The set-up of (12 5l). which we use in this work, simplifies the analysis since we only 
consider one promoter transcription factors, so that fi = — rifa + oti((j) ni ), where 7Tj includes 
just one gene. In steady-state, fa = ajj/Vj. Finally, all of our reactions are enzymatic, 
so the diffusion matrix B will only have diagonal nonzero elements. Then, since Bu = 
rifa + «i, we use the expression for fa to find Bu = 2vifa. 



2.9.1 Auto-repression 

For the auto- repressive case there are no covariance terms and = —ri + a' { , so we can 
rewrite 

C u = (30) 
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Auto-repression implies a' i < 0. Thus Cu < (pi, resulting in a sub-Poisson noise. A similar 
derivation using LNA is given for regulated degradation in (jlil ) and regulated synthesis 



in ((62) 



2.9.2 A 2-cycle 

In this case, 7Tj = i — 1, and 7Tj_i = i. Assuming no auto-regulation, let 3^ = and 
^a'r 1 = a', i . Now we write, 

d<t>i t ~ i 

Cu = (pi H a-C M _i, (31) 



r 



Cj j_i = — ■ (a'fii-i i-i + ot^iCji) . (32) 

n + ri_ x 

To reduce Cu we can reduce the magnitude of C^-i. One way to achieve this is 
to have opposite signs for and a'i_x- Moreover, the sub-Poisson noise is possible if 
a'tCij-i < 0, which is possible only if a^a'^ < 0. Thus the presence of a negative and 
positive regulator in a 2-cycle is a necessary, but not sufficient condition for achieving 
sub-Poisson noise. For sufficiency, we also need |a4Cj-i,i-i| < |«i_iCn|- 

2.9.3 A 3-cycle 

In this case, 7Tj = % — 1, 7r^_ 1 = i — 2, and 7Tj_ 2 = The variance equation stays the same 

Cu = (pi + — a-Ci (i _i. (33) 



However, now we have 



faJCi-Li-i + cd^M-a) • (34) 



Cii-i — ■ 

n + 

and 

Ci,i-2 = — : ( a-C i _i ii _ 2 + oi^Cii ) . (35) 

n + n- 2 



Combining the above into a single expression for Cu, we have 



Cu = (pi H -, — { <\] 

rAn + ri_i 



• (36) 



The last term gives us a product of the derivatives, a i a;j„ 1 Q; i _ 2 . If this product is negative 
(that is, if we have an odd number of repressors in the cycle) then we can reduce the 
overall magnitude of the variance Cu. Note here that we have two extra terms in the 
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variance. One, (a^Cj-i^-i, is always positive, while the other can be of either sign. 
Thus the overall negative regulation is not a guarantee of a sub-Poisson noise in this case. 

Ultimately, noise regulation can be improved with cycles with odd number of negative 
regulators. However, as cycles get larger and the network becomes more complex, the 
achievability of sub-Poisson noise becomes more limited. This may be related to the 
observation that, whereas small cycles are over-represented in a metabolic network, large 
cycles occur less frequently than one would expect given several different possible null 
models (@). 

3 Results 

3.1 Transmitting more than 1 bit at low copy number 

We tested the ability of each of the 24 different circuits to reliably transduce input signals. 
For each circuit, we numerically optimized Eq. [3] at different A and 7. The results of a 
single optimization thus give us a local maximizer #*(A, 7) of L. For each numerically 
obtained we then plot the corresponding mutual information 7* (as calculated by 
Eq. [2]) as a function of the actually observed average number of reporter molecules (Nq) = 
Ijgp? Sl=i Sf=i 9i- For example, in Fig. [2] we show the results of multiple maximizations 
for two typical circuits. Each point on the plot corresponds to a $*(A, 7). The blue 
squares and the red diamonds correspond to the two different 7 values and the solid lines 
correspond to the "best" solutions which we determine by finding the convex hull of the 
set of all maxima. 

Not surprisingly, as the A constraint is weakened, and higher molecule numbers are 
allowed, more information is transduced on average (blue and red curves always increase 
monotonically) , Similarly, as the 7 constraint is increased, and the stiff solutions are 
constrained, less information is transduced (red curve is always less than or equal to blue 
curve). We report that all 24 topologies can pass more than 1 bit of information with 
molecule numbers far smaller than 100. In fact, at 25 molecules, most circuits can pass 
nearly 2 bits of information. In short, generic topologies under biological constraints of 
response time and molecule numbers can still transduce more information than a simple 
binary switch. 

3.2 Determining optimal bounds 

To determine how well the circuits performed compared to the optimal solution, we first 
note that all solutions are upper bounded by the entropy of the input distribution, which 
in our case is H(C) = 3 bits. Next, recall that the reporter protein, G, must be at least 
subject to its own intrinsic noise, and the variance of this noise must be at least that of 
a Poisson distribution (p(x) = exp(— X)X x /x\) with mean A = g c (since the reporter does 
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Figure 2: We ran multiple optimizations = argmax^ L. For each optimization run, 
we plot the mutual information I* = I(C, vs. the mean number of molecules 

of the reporter protein (Nq). Input distribution p(c) = 1/||C|| and ||C|| = 8 so that 
I(C, G) < H(C) = 3 bits. Blue squares and red triangles are for 7 = 0.001 and 7 = 0.01, 
respectively. The blue and red linearly interpolated lines correspond to the convex hull 
for each respective 7 value. The black solid curve gives the numerically evaluated optimal 
bound (cf. Sec. 13.31) and dashed curve gives analytic bound for any input distribution 
(cf. Sec. 12. 7p . Inset: (/) as a function of a fraction of data m included in the analysis 
(cf. Sec. 13.31) . Blue and red correspond to two different 7 values. Linear regression 
extrapolated to case of infinite data (y- intercept). Results for two typical circuits with 
1-cycles: (a) Circuit 19 with odd number of negative regulators in cycle and (b) Circuit 
11 with even number of negative regulators in cycle. Note here as in Fig. [3] and Fig. H] 
circuits on left have larger (/) values as well as smaller differences between the two 7 
values than circuits on right. 
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Figure 3: Same as in Fig. [2]for two circuits with 2 cycles: (a) Circuit 23 with odd number 
of negative regulators in cycle and (b) Circuit 5 with even number of negative regulators 
in cycle. 




Figure 4: Same as in Fig. [2]for two circuits with 3 cycles: (a) Circuit 13 with odd number 
of negative regulators in cycle and (b) Circuit 17 with even number of negative regulators 
in cycle. 
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not have any feedback) (1451 ) . Given this noise lower bound and a probability distribution 
over inputs C (in this case a uniform distribution of 8 delta functions), we can numerically 
calculate an optimal transduction curve. That is, we optimize 

L = I(C,G)-X(N G ), (37) 

where (Nq) = z2c=i 9° an< ^ we denote g c to emphasize that the means g c are optimally 
separated and are described by a Poisson distribution. For different values of A, we can 
define an optimal curve / vs. (Nq), where I is the mutual information at the maximum 
of L. All 24 topologies are upper bounded by the resulting curve, since the reporter gene 
must be at least subject to its own g c noise plus any noise translated from upstream factors. 
Finally we note that / is always bounded by the channel capacity Iq which is defined to 
be the maximum over all input distributions and can be approximated analytically as in 
Eq. [22] (see section EI}. For (N G ) = 25 molecules, J = 2.32 bits and for (N G ) = 100 
molecules Iq = 3.32 bits. 



3.3 (Almost) optimal circuits 

We find that all 24 circuits are able to achieve close to the optimal transmission fidelity, 
implying that they are able to tune the noise from the upstream factors to almost negligible 
values (see Figs. [2]|3] and Table [3]). To quantify how well the circuits perform compared 
to the optimal bound and to each other, we define the statistic 

(/> = — !— r / I(x) dx, (38) 
\b-a\ J a 

where I is the linearly interpolated convex hull and a and b are set to 25 and 120 molecules, 
respectively. Note that for our discrete input distribution we can upper bound (J) < 2.75 
bits, where we use the linearly interpolated curve derived numerically J, as described in 
Sec. I3.2( similarly, for any input distribution we can upper bound (I) < 3.03 bits, where 
we use the analytic approximation I derived in section 12.71 

Since the convex hull area can only grow with the number of optimizations we run, 
there is a bias in our calculated statistic (/). That is, with k optimization runs, > 
(I)k-i- We are interested in (/) = (J)oo, but this is clearly unattainable. Moreover, 
for different topologies, (/) may be approached with different speeds as a function of k, 
making comparisons between topologies suspect. We use jackknifing to estimate the bias. 



That is, in the spirit of (1601 ; I63T ). instead of the total number of optimization runs A^ opt , 



we use only N opt /m of them to calculate (I). Then one can estimate (I)oo by fitting 

(J(m)> = </> + - + ^§ + ---- (39) 

where Ai are some constants. In the insets of Figs. [2]|4]we show the dependence of (/) on 
m, the inverse fraction of data included. We see that for the most part (I(m)) is well fit 
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by a straight line, and contributions from the higher order corrections are insignificant. 
The results of extrapolating (/) to m = for each circuit are reported in Tables H] and 
El The average (I) over all circuits is 2.48 ± 0.05 and 2.32 ± 0.09 bits for 7 = 0.001 and 
7 = 0.01 respectively. We find the circuits are within 10% of the optimal transduction 
capacity. 



3.4 Ranking circuits 

The optimality measure (/) provides a ranking of the topologies (see Tables H] and [5]) . 
While, strikingly, all of the circuits perform close to the optimal bound, systematic differ- 
ences are revealed. Consider for example the linear chains with autoregulation (circuits 
1, 19, 14, and 4 with negative feedback and circuits 21, 15, 16, and 11 with positive feed- 
back). We note that the negative feedback circuits all have higher (I) values than their 
positive feedback counterparts. Moreover, the difference (gap) between the 7 = 0.001 and 
7 = 0.01 curves is smaller for the negative feedback circuits than for the positive feed- 
back circuits. That is, even when the stiffness is constrained, these circuits still do well, 
whereas the other circuits are more reliant on stiff dynamics. These results are consistent 
with findings in (Q) that autorepressive circuits can help regulate noise. Interestingly, this 
trend can be generalized to the circuits with longer cycles as well. For example, we also 
find that for circuits with 2-cycles, those that perform best are those that have opposite 
regulations (one repressive, one activating) rather than two activating or two repressing 
regulators. For the case of 3-cycles, those circuits with 1 or 3 negative regulators have 
on average higher values of (/). In Figs. [20 we display curves for typical 1-, 2-, and 
3- cycles, respectively with both odd (left column) and even (right column) number of 
negative regulators. 

These findings imply that there are some structural constraints that impart small, 
but measurable limitations to the circuit's transduction capacity. In particular, those 
circuits with an odd number of regulators (an overall negative feedback) in their cycles 
are generally ranked higher than those circuits with an even number of regulators (an 
overall positive feedback), see Tables H] and [5j In Fig. [5] we show a bar graph of the values 
of (/) for the two classes of circuits (odd and even number of regulators in the cycle) for 
different 7 values and for different length cycles. The average mutual information for the 
circuits with an odd number of negative regulators is 2.51 ± 0.03 and 2.39 ± 0.05 for the 
two 7 values, whereas for the circuits with an even number of negative regulators it is 
2.44 ± .03 and 2.26 ± .05 for the two 7 values. Between the two classes, these values are 
more than one standard deviation apart. To test the significance of this observation, we 



perform the non-parametric Mann- Whitney U ( Wilcoxon) Test ( 140l ; |72| ) , which measures 
the difference in medians between two samples. We find for 7 = 0.001, U = 8 and 
p = 0.0002, and for 7 = 0.01, U = 10 and p = 0.0003. That is, the null hypothesis 
that the optimality measures for the two classes of circuits (odd and even number of 
regulators, or, alternatively, overall negative and positive feedback) are drawn from the 
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same distribution and, therefore, have the same medians, is highly unlikely. 



3.4.1 Noise analysis 

Since circuits containing cycles with an odd number of negative regulators are better signal 
transducers, we might expect that they are able to control the noise variance better. In 
fact, using LNA (cf. Sec. 12.4ft . we prove this assertion for a generic transcriptional network 
in Sec. 12.81 Furthermore, for simple networks we demonstrate that the overall negative 
feedback is a necessary and, in one case, even a sufficient condition to achieve sub-Poisson 
noise (variance less than the mean). 

For example, let d(j>i/dt — = —Tifa + ai((p ni ) describe the deterministic dynamics 
of gene i (see Sec. 12.21 for explanation of the notation). In a steady-state fa = a^^J/rj. 
Then, for a 1-cycle where 7Tj = i, Eq. [25] for a species variance reduces to 

Cu = (40) 

r 

where a' is the derivative of the gene expression function, and the above is evaluated at 
the deterministic steady state. In the case of an auto-repression, a' < 0, and the variance 
Cu is less than the mean (0; [l8|) . 

Similarly, Eq. [25] can be reduced for a 2-cycle, i, j = {1, 2} 

C u = fa^-a'Aa (41) 

' i 

c a = zr^zr(^ c n + a '^)- ( 42 ) 

i ' 3 



Since Cu > 0, here a necessary (but not sufficient) condition for sub-Poisson noise is 



a[a' 2 < 0. 



This analysis (as well as the derivation in Sec. I2.8j) also illustrates that it is easier to 
obtain smaller variance (and hence larger mutual information), for cycles of shorter length. 



This is in agreement with (|24| ) where it was found that short cycles are over-represented 
in a metabolic network, but large cycles occurred less frequently than one would expect 
given several different possible null models. 



3.5 Reliance on large (q) 

The difference (gap) between the two 7 curves also suggests a statistic to compare the 
circuits. Presumably, a large difference implies that the circuit relies on large stiffness (q) 
to regulate noise. Indeed, for large (q), the objective function L decreases, though this 
decrease is moderated by the value of 7 such that smaller 7 values allow larger (q) values. 
Stiff solutions have the advantage of allowing the reporter protein to effectively act as a 
low-pass filter, slowly sampling and responding to fluctuations in the circuit components. 
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2-cycles 



3-cycles 
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1 -cycles 



2-cycles 



3-cycles 



Figure 5: Bar graphs for (I) values for the two classes of circuits: odd (blue) includes 
circuits with cycles containing an odd number of regulators and even (green) includes 
circuits with cycles containing an even number of regulators. Top 71 = 0.001, middle 
72 = 0.01, and bottom — j 2 - F° r all 3 measures, there is a statistically significant 
difference between the two classes of circuits as calculated by the U Test (top p = 0.0002, 
middle p = 0.0003 and bottom p = 0.01). 
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A reliance on small values of 7 implies that the circuit has more difficulty regulating 
noise. We therefore expect the circuits with an odd number of negative regulators to have 
smaller gaps. Consistent with this prediction, while the average gap over all circuits was 
0.16 ± 0.05, the average gap for the negative feedback circuits was 0.13 ± 0.04, and for 
the rest it was 0.18 ± 0.05 (see Fig. [5]). The U Test using the gap measure gives U — 28 
and p = 0.01, indicating a statistically significant difference. 

Evidence from a database of transcription factors in prokaryotes supports the finding 
that circuits with negative feedback can suppress noise (1541 ) . In E. coli, many transcription 
factors do not undergo active degradation via proteolysis, but are instead only passively 
degraded via dilution. The half-lives of such proteins are on the order of the lifetime of the 
cell, allowing them to respond only slowly to fluctuations in the mRNA concentrations. 
As in the case of the stiff solutions with large (q) in our circuits, these slowly responding 
transcription factors have an advantage in noise control (1471 ). Therefore we might expect 
that transcription factors that do not undergo proteolysis will have no auto-repression, or 
even positive auto-regulation. On the other hand, transcription factors that do undergo 
proteolysis and cannot, therefore, filter mRNA fluctuations as well would be more likely 
to require negative auto-regulation. To test this hypothesis, we analyzed 145 transcription 
factors of the E. coli regulatory network (see supplemental material i n ([731 . For each 
transcription factor we correlated whether the factor is auto-repressive ( 1541 ) with whether 
it potentially undergoes proteolysis (by noting if the peptide sequence had any known 
cleavage sites) (1501 ) . We found that of the 13 transcription factors which are likely to 
undergo proteolysis, 9 are negative auto-repressors, and out of the 132 transcription factors 
which do not, 88 are not auto-repressors (see Tabled]). A Fisher exact probability test 
revealed a statistically significant positive association between proteolysis and negative 
feedback (p = 0.01). 





Negative Feedback 


No Negative Feedback 


Proteolysis 


9 


4 


No Proteolysis 


44 


88 



Table 1: Table comparing presence or absence of proteolysis to presence or absence of 
negative auto-regulation. Fisher exact probability test reveals signficant (p = 0.01) pos- 
itive association. This supports our prediction that transcription factors which undergo 
proteolysis, and therefore have faster response times, are less able to regulate noise using 
the slow response, filtering solution and require the presence of negative auto-regulation 
to help control their noise. 
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(a) (b) 

Figure 6: (a) The objective function L and (b) the mutual information / as a function of 
the first two input parameters (input 1 is sx and input 2 is sy) for circuit 2. The rest of 
the parameters are held constant for this figure. The five labeled peaks correspond to 5 
distinct behaviors or unique signal encodings (cf. Fig. Hand Table [2]). 



3.6 Robust, adaptive maxima 

An important consideration in further assessing the quality of our circuits is the extent to 
which these high information maxima are robust to perturbations in the system. Quali- 
tatively, we define a maximum as robust if, in its vicinity, the cost function L does not 
change significantly in response to perturbation of the parameters R, K, a, a , and s (see 
Sec. 12.21 for parameter definitions). Relatedly, we would also like to consider the ability 
of our circuits to adapt their behavior to such perturbations. An adaptive maximum 
does not significantly decrease its function value L in response to parameter perturba- 
tion, but does alter its behavior. Here we characterize the behavior of the circuit at each 
constrained mutual information maximum by ordering g c . That is, two maxima have 
different behaviors if the permutation yielding the sorted sequence of g c is different. 

As a preliminary investigation, we analyzed the functional L of circuit 2 near one of its 
randomly selected maximum. In addition to the original maximum, we found four other 
distinct nearby peaks as displayed in Fig. [61 The circuit alters its behavior as a result of 
changes along the 2 displayed dimensions (cf. Eq. [6]), input 1 (sx) and input 2 (sy), so 
that at each maximum the ordering of responses is distinct, and the signal is encoded in a 
different way (see Table [2]). Note that 4 of the maxima are separated by valleys no deeper 
than 2.3 bits. In other words, by a change in sx and sy only, the circuit can alter its 
behavior, while maintaining a high transmission fidelity. In this sense, we consider these 
maxima to be adaptive. 

We next numerically calculated the Hessian at each of the 5 peaks. In Fig. [7J we plot 
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Peak 1 

leak ^^p^J 




Mode Index Mode Index Mode Index 



Figure 7: Top-left: Spectra for the numerically calculated Hessian at each of the corre- 
sponding 5 peaks labeled in Fig. El Soft modes (— > 0) are directions in which L has small 
curvature, hard modes (— > — oo) are directions in which L has large curvature. Many 
eigendirections exhibit small curvature (greater than — 10~ 2 for peaks 2-4 and — 10 _1 for 
peaks 1 and 5), demonstrating that the maxima are robust to large deviations in param- 
eter space. Colored panels: Magnitude of contribution from each parameter to each 
eigenvector for each of the 5 Hessians. Mode index is sorted as in top-left figure (from least 
curvature to greatest curvature). Row labeled leak corresponds to parameter an. Paired 
rows labeled X, Y, Z, and G correspond to the two parameters, K and a, describing 
the gene regulation function for each transcription factor (X, Y, Z) and reporter protein 
(G). Rows labeled r correspond to the decay rates of each of the 3 transcription factors. 
Rows labeled s correspond to the input parameters modulating the three transcription 
factors. For all 5 peaks, the two most soft modes correspond to an and a mixture of Ky, 
ay, respectively, sx and sz contribute mostly to the hard modes. 
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Chemical State 


000 


001 


010 


Oil 


100 


101 


110 


111 


Peak 1 


2 


6 


1 


5 


4 


8 


3 


7 


Peak 2 


2 


6 


4 


1 


5 


8 


3 


7 


Peak 3 


2 


1 


4 


6 


3 


5 


8 


7 


Peak 4 


2 


1 


6 


4 


5 


3 


8 


7 


Peak 5 


6 


2 


5 


1 


8 


4 


7 


3 



Table 2: Table of behaviors corresponding to the five peaks shown in Fig. EJ Behavior is 
defined as the ordering of g c , where g c is the deterministic steady-state solution for given 
chemical input c and c e {000,001,010,011, 100, 101, 110, 111}. Each row describes the 
behavior of the circuit at one of the five maxima. 

the Hessian spectra along with the corresponding eigenvectors. By treating L as locally 
quadratic near each maximum we use the Hessian (evaluated with respect to log 10 of the 
parameters) to analyze how sensitive the maximum is to deviations in the parameters. For 
example, for an eigenvalue equal to —1, moving 10-fold in the corresponding eigendirection 
would result in a loss of 0.5 for the objective function. Alternatively, an eigenvalue equal 
to —0.1 means that we can move 10-fold in that direction while decreasing the objective 
function only by 0.05. This should be compared with the typical values near maxima of 
L, I ~ 2 bits. We find that for most directions for all 5 peaks eigenvalues are less than 
—0.1. In this sense, we consider these maxima to be robust. 

We can identify three different regimes for the spectra: an extremely "soft" regime 
corresponding to the first two modes, a second soft regime, where modes 3 to 9 are basically 
equivalent, and a third regime (modes 10 through 15), where the eigenvalues become more 
negative. We note that the spectra for peaks 1 and 5 overlap almost entirely, as do the 
ones for 2, 3 and 4, and that the latter appear to be more robust (eigenvalues are closer 
to 0). Interestingly, all five spectra in Fig. [7] are similar, due to the fact that the are 
themselves quite similar — that is, the maxima are closely arranged not just in the 2- 
dimensions displayed in Fig. [6], but over all 15 dimensions. This underscores the circuit's 
adaptability. 

In Fig. [3, we have also displayed the contributions from each parameter to each eigen- 
vector for all 5 peaks. It is clear that the first mode corresponds entirely to the leak 
parameter, which for all 5 peaks is being driven to as the optimization proceeds. The 
second mode is also consistent for all 5 peaks, and it corresponds to the parameters ay 
and Ky (cf. Sec. [6]), governing creation for the transcription factor Y . Essentially the 
range of the gene activation function, ay, is increased while the Michaelis constant Ky is 
decreased, so that Y is squeezed to low copy numbers. This is a reasonable strategy since 
maximizing the information in the output signal requires that most of the energy spent 
on building molecules is expended on the reporter protein. 

Another consistent finding for all of the peaks is that inputs 1 (sx) and 3 (sz) are 
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more critical (less robust) than input 2 (sy). That is, in general, sx and sz contribute 
to the highest 3 modes, whereas sy contributes mostly to modes in the "soft" regime. 
Qualitatively, this is consistent with Fig. [6j where most of the peaks are ellipses with 
major axes roughly in line with sy. Note that this is most evident for peaks 1 and 5 for 
which sx and sy correspond almost exclusively to modes 14 and 4, respectively. 

4 Discussion 

We have presented an information-theoretic, function-independent measure of circuit qual- 
ity. We have demonstrated that generic, small networks can transduce more information 
than a simple binary switch; moreover, such generic topologies can achieve close to opti- 
mal transmission fidelity, even under low copy number and fast response time (non-stiff) 
constraints. High information solutions can be robust to 10-fold deviations in parameters. 

That such simple, stochastic systems can act as high-fidelity signal transducers sug- 
gests a possible explanation for the cross-talk dilemma, in which multiple ligands trigger 
the same signaling pathway, and yet reliably produce distinct genetic outputs. We have 
demonstrated that multiple discrete input states can be transduced by the same signaling 
pathway and even the same molecule if the encoding is in molecule numbers. Moreover, 
when the trivial solutions to this problem are constrained (high copy number and slow 
response time) the input state can still be transduced reliably, implying that these simple 
circuits have enough flexibility to regulate noise. To our knowledge, this is the first demon- 
stration that a simple, stochastic system can overcome cross-talk, without invoking the 
traditional sequestering argument (J57j) (where signal specificity is achieved by spatially 
or temporally sequestering pathways with shared components). 

It may be possible to correlate some of the solutions of the circuits with experimental 
data to investigate to what extent is transduction optimality an essential goal of these 
systems. For example, a common generic solution of our circuits was to decrease the 
decay rate and increase the average molecule number of the reporter protein or the pro- 
teins that appear near the end of the circuit. The slower decay rate meant the reporter 
protein could temporally average the fluctuations of the proteins at the beginning of the 
circuit. Similarly, since the total number of molecules is constrained, it is best to expend 
energy building reporter molecules which need to encode the entire input signal, rather 
than wasting molecules on proteins in the beginning of the circuit. One well-known ex- 
ample of this is in the transcription-translation cascade from DNA to protein. Typically, 
mRNA degradation rates are faster than protein degaradation rates, and mRNA molecule 
numbers are smaller than protein molecule numbers. 

Other more subtle correlations may also be identified. Proteins that do not undergo 
proteolysis undergo decay by dilution; the characteristic time scale for these proteins is on 
the order of an hour. This implies that one can have noise filtering due to mRNA-protein 
scale mismatch without requiring negative feedback. One would expect to find a positive 
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association between proteolysis and negative feedback. Consistent with this prediction, a 
Fisher exact probability test revealed a significant positive association (p = 0.01) based on 
transcription factors from the E. coli gene regulatory network. Rosenfeld et al. (5l|) have 
argued that an auto-repressive circuit averts the need for proteolysis, since the negative 
auto- regulation shortens the rise time of the circuit. Closer analysis of transcription 
factor databases for other organisms may help distinguish between these two putative 
roles of auto-repression. Interestingly, in prokaryotes the percent of transcription factors 
undergoing proteolysis is less than in eukaryotes, and the percent of transcription factors 
which are auto-repressive is less than in eukaryotes. 

In their analysis of the phototransduction cascade, Detwiler et al. (llOh emphasize 
that the signal processing characteristics of the cascade can be tuned simply by altering 
the concentrations of proteins, rather than changing the genetic sequence. That is, the 
parameters of the system can be optimized on a time scale far shorter than evolution. 
So too, in our simple circuits, all of the kinetic constants can be regarded as functions of 
concentrations of proteins extrinsic to the circuit, meaning the parameters may also be 
tuned, even independently, on a time scale shorter than the response time of the system. 
We highlight that circuits supporting multiple, distinct maxima should be able to flip 
between behaviors in response to different stimuli, and that theoretically such adaptation 
can be as rapid as changes in protein concentration. Importantly, based on our findings, 
such adaptation can be smooth and still occur without significant loss in transduction 
capacity. 

The fact that the 5 peaks we analyzed collapsed onto two categories of spectra un- 
derscores a somewhat paradoxical finding. Namely, the maxima are robust in that they 
can withstand 10-fold perturbations in most of their parameters without significant loss 
in transmission fidelity, and yet they are adaptive in that the circuit can flip between the 
different maxima (and different behaviors), again without significant loss in transmission 
fidelity. Intuitively, one might expect a tradeoff between robustness and adaptability. 
Our findings suggest that the circuits can avert this tradeoff by clustering the maxima 
in a general region of high transmission fidelity. Certainly a closer and more quantita- 
tive analysis of this tradeoff is warranted. For example, it is now well-established that a 
single circuit can support multiple functions (169!) ■ m t ms vein, one interesting research 
direction would be to enumerate the functions that a particular circuit can achieve and 
quantify how easily the circuit can flip between these functions. Whereas our circuits 
can all be regarded as "optimal" in the sense that they can tune their parameters to 
transduce the optimal amount of input information, it is evident that subtle distinctions 
in information processing exist among them. Our setup is well-suited to systematically 
explore these distinctions (e.g., varying the input distribution, quantifying the mutual in- 
formation between time-varying input and output signals, and quantifying other statistics 
of the mutual information landscape rather than optimality). 
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Table 3: Table of mutual information / function 
of mean reporter copy number (Nq). Insets: (/) as a 
function of inverse data m (cf. 13.31) . Note that circuits 
5, 11, 13, 17, 19, 23 are shown in Figs. HH 



Circuit 1 Circuit 2 




Circuit 3 Circuit 4 
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Circuit 6 



Circuit 7 
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Circuit 10 



Circuit 12 




Circuit 14 Circuit 15 
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Circuit 16 Circuit 18 
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Circuit 22 



Circuit 24 
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Table 4: Table of Circuits (top 12). Extrapolated average mutual information over range 
of 25 to 120 molecules at 7 = 0.001 and 7 = 0.01. 
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Table 5: Table of Circuits (bottom 12). Extrapolated average mutual information over 
range of 25 to 120 molecules at 7 = 0.001 and 7 = 0.01. 



